 immediate feedback




Humans can learn to detect AI-generated texts, or at least learn when they can't

Milička, Jiří, Marklová, Anna, Drobil, Ondřej, Pospíšilová, Eva

arXiv.org Artificial Intelligence

This study investigates whether individuals can learn to accurately discriminate between human-written and AI-produced texts when provided with immediate feedback, and whether they can use this feedback to recalibrate their self-perceived competence. We also explore the specific criteria individuals rely upon when making these decisions, focusing on textual style and perceived readability. We used GPT-4o to generate several hundred texts across various genres and text types comparable to Koditex, a multi-register corpus of human-written texts. We then presented randomized text pairs to 254 Czech native speakers who identified which text was human-written and which was AI-generated. Participants were randomly assigned to two conditions: one receiving immediate feedback after each trial, the other receiving no feedback until experiment completion. We recorded accuracy in identification, confidence levels, response times, and judgments about text readability along with demographic data and participants' engagement with AI technologies prior to the experiment. Participants receiving immediate feedback showed significant improvement in accuracy and confidence calibration. Participants initially held incorrect assumptions about AI-generated text features, including expectations about stylistic rigidity and readability. Notably, without feedback, participants made the most errors precisely when feeling most confident -- an issue largely resolved among the feedback group. The ability to differentiate between human and AI-generated texts can be effectively learned through targeted training with explicit feedback, which helps correct misconceptions about AI stylistic features and readability, and potentially about other variables that were not explored, while facilitating more accurate self-assessment. This finding might be particularly important in educational contexts.




The way we train AIs makes them more likely to spout bull

New Scientist

Common methods used to train artificial intelligence models seem to increase their tendency to give misleading answers, according to researchers who are aiming to produce "the first systematic analysis of machine bullshit". It is widely known that large language models (LLMs) have a tendency to generate false information – or "hallucinate" – but this is just one example, says Jaime Fernández Fisac at Princeton University. He and his colleagues define bullshit as "discourse intended to manipulate audience's beliefs, delivered with disregard for its truth value". "Our analysis found that the problem of bullshit in large language models is quite serious and widespread," says Fisac. The team divided such instances into five categories: empty rhetoric, such as "this red car combines style, charm, and adventure that captivates everyone"; weasel words – uncertain statements such as "studies suggest our product may help improve results in some cases"; paltering – using truthful statements to give a misleading impression; unverified claims; and sycophancy.


Review for NeurIPS paper: Model Selection for Production System via Automated Online Experiments

Neural Information Processing Systems

Summary and Contributions: The paper proposes a model selection algorithm called Model Selection with Automated Online Experiments (AOE) that is designed for use in production systems. In the problem statement, it is stated that the goal of the model selection problem is to select the model from a set of candidate models that maximises a metric of interest. It is assumed that the metric of interest can be expressed as the average immediate feedback from each of a model's predictions. AOE uses both historical log data and data collected from a small budget of online experiments to inform the choice of model. A distribution for the accumulative metric, or expected immediate feedback, is derived.
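The problem statement summarised above (choose the candidate whose average immediate feedback is highest) can be illustrated with a minimal sketch. This is not the AOE algorithm itself: it omits the online experiments and the distribution over the metric, and the names `select_model`, `logged_feedback`, `model_a` and `model_b`, along with the toy feedback values, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def select_model(logged_feedback):
    """Pick the candidate with the highest average immediate feedback.

    logged_feedback maps a (hypothetical) model name to a list of
    per-prediction feedback values, e.g. 1 for a click and 0 otherwise.
    """
    # The metric of interest is assumed to be the mean of the
    # immediate feedback observed for each of a model's predictions.
    scores = {name: mean(values) for name, values in logged_feedback.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up feedback logs for two hypothetical candidates.
logged_feedback = {
    "model_a": [1, 0, 1, 1],
    "model_b": [0, 0, 1, 0],
}

best, scores = select_model(logged_feedback)
print(best, scores)
```

A plain empirical mean like this presumes feedback is already available for every candidate; the method under review instead combines historical logs with a small budget of online experiments to estimate that quantity.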


RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation

Liang, Kaiqu, Hu, Haimin, Liu, Ryan, Griffiths, Thomas L., Fisac, Jaime Fernández

arXiv.org Artificial Intelligence

Generative AI systems like foundation models (FMs) must align well with human values to ensure their behavior is helpful and trustworthy. While Reinforcement Learning from Human Feedback (RLHF) has shown promise for optimizing model performance using human judgments, existing RLHF pipelines predominantly rely on immediate feedback, which can fail to accurately reflect the downstream impact of an interaction on users' utility. We demonstrate that feedback based on evaluators' foresight estimates of downstream consequences systematically induces Goodhart's Law dynamics, incentivizing misaligned behaviors like sycophancy and deception and ultimately degrading user outcomes. To alleviate this, we propose decoupling evaluation from prediction by refocusing RLHF on hindsight feedback. Our theoretical analysis reveals that conditioning evaluator feedback on downstream observations mitigates misalignment and improves expected human utility, even when these observations are simulated by the AI system itself. To leverage this insight in a practical alignment algorithm, we introduce Reinforcement Learning from Hindsight Simulation (RLHS), which first simulates plausible consequences and then elicits feedback to assess what behaviors were genuinely beneficial in hindsight. We apply RLHS to two widely employed online and offline preference optimization methods -- Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) -- and show empirically that misalignment is significantly reduced with both methods. Through an online human user study, we show that RLHS consistently outperforms RLHF in helping users achieve their goals and earns higher satisfaction ratings, despite being trained solely with simulated hindsight feedback. These results underscore the importance of focusing on long-term consequences, even simulated ones, to mitigate misalignment in RLHF.


Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Zhang, Peiyan, Jin, Haibo, Hu, Leyang, Li, Xinnuo, Kang, Liying, Luo, Man, Song, Yangqiu, Wang, Haohan

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have significantly enhanced the ability of LLM-based systems to perform complex tasks through natural language processing and tool interaction. However, optimizing these LLM-based systems for specific tasks remains challenging, often requiring manual interventions like prompt engineering and hyperparameter tuning. Existing automatic optimization methods, such as textual feedback-based techniques (e.g., TextGrad), tend to focus on immediate feedback, analogous to using immediate derivatives in traditional numerical gradient descent. However, relying solely on such feedback can be limited when the adjustments made in response to this feedback are either too small or fluctuate irregularly, potentially slowing down or even stalling the optimization process. To overcome these challenges, more adaptive methods are needed, especially in situations where the system's response is evolving slowly or unpredictably. In this paper, we introduce REVOLVE, an optimization method that tracks how "R"esponses "EVOLVE" across iterations in LLM systems. By focusing on the evolution of responses over time, REVOLVE enables more stable and effective optimization by making thoughtful, progressive adjustments at each step. Experimental results demonstrate that REVOLVE outperforms competitive baselines, achieving a 7.8% improvement in prompt optimization, a 20.72% gain in solution refinement, and a 29.17% increase in code optimization. Additionally, REVOLVE converges in fewer iterations, resulting in significant computational savings. These advantages highlight its adaptability and efficiency, positioning REVOLVE as a valuable tool for optimizing LLM-based systems and accelerating the development of next-generation AI technologies. Code is available at: https://github.com/Peiyance/REVOLVE.


Large Language Model-based System to Provide Immediate Feedback to Students in Flipped Classroom Preparation Learning

Uchiyama, Shintaro, Umemura, Kyoji, Morita, Yusuke

arXiv.org Artificial Intelligence

This paper proposes a system that uses large language models to provide immediate feedback to students in flipped classroom preparation learning. This study aimed to solve challenges in the flipped classroom model, such as ensuring that students are emotionally engaged and motivated to learn. Students often have questions about the content of lecture videos while preparing for flipped classrooms, but it is difficult for teachers to answer them immediately. The proposed system was developed using the ChatGPT API on a video-watching support system for preparation learning that is being used in real practice. Because answers from ChatGPT often do not align with the context of the student's question, this paper also proposes a method to align the answer with that context, as well as a method to collect the teacher's answers to the students' questions and use them as additional guides for the students. This paper discusses the design and implementation of the proposed system.